Backtracking Spatial Pyramid Pooling (SPP)-based Image Classifier for Weakly Supervised Top-down Salient Object Detection
Top-down saliency models produce a probability map that peaks at target
locations specified by a task/goal such as object detection. They are usually
trained in a fully supervised setting involving pixel-level annotations of
objects. We propose a weakly supervised top-down saliency framework using only
binary labels that indicate the presence/absence of an object in an image.
First, the probabilistic contribution of each image region to the confidence of
a CNN-based image classifier is computed through a backtracking strategy to
produce top-down saliency. From a set of saliency maps of an image produced by
fast bottom-up saliency approaches, we select the best saliency map suitable
for the top-down task. The selected bottom-up saliency map is combined with the
top-down saliency map. Features having high combined saliency are used to train
a linear SVM classifier to estimate feature saliency. This is integrated with
combined saliency and further refined through multi-scale superpixel averaging of the
saliency map. We evaluate the proposed weakly supervised top-down saliency framework
and achieve performance comparable to that of fully supervised approaches. Experiments
are carried out on seven challenging datasets, and quantitative results are compared
with 40 closely related approaches across 4 different applications.
Comment: 14 pages, 7 figures
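The pipeline above selects one of several fast bottom-up saliency maps and combines it with the backtracked top-down map. A minimal numpy sketch of that stage; the selection criterion (correlation with the top-down map) and the fusion rule (element-wise multiplication) are assumptions, since the abstract only states that a map is selected and then combined:

```python
import numpy as np

def combine_saliency(bottom_up_maps, top_down, eps=1e-8):
    # Pick the bottom-up map most consistent with the top-down map.
    # Correlation is an assumed selection criterion, and element-wise
    # multiplication an assumed fusion rule -- not the paper's exact method.
    td = top_down.ravel()
    scores = [np.corrcoef(m.ravel(), td)[0, 1] for m in bottom_up_maps]
    best = bottom_up_maps[int(np.argmax(scores))]
    combined = best * top_down
    return combined / (combined.max() + eps)  # normalize to [0, 1]
```

Regions scoring high in the combined map would then supply training features for the linear SVM described above.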
Sampling-based image and video matting without compositing equation
Image and video matting play a fundamental role in image and video editing applications.
They are generally classified into α-propagation based approaches and color sampling
based approaches. In α-propagation methods, the correlation between neighboring pixels
with respect to local image statistics is leveraged to interpolate the known alpha values
into the unknown regions. In color sampling methods, foreground (F) and background
(B) samples from known regions that represent the true colors of the unknown pixels
are used to estimate alpha. Complex color distributions of foreground and background
regions, highly textured edges, and unavailability of true F and B samples are some of
the main challenges faced by current works. In addition to this, sampling methods have
traditionally followed the compositing equation using (F, B) pairs for alpha estimation.
When extended to videos, the unavailability of user-defined trimaps in each frame and
the additional requirement of temporal coherency across the sequence make the matte
extraction process highly challenging. We aim to develop novel natural matting
algorithms for both images and video that can alleviate the drawbacks faced by current
methods in generating a good quality matte. We achieve the objectives through the
following contributions.
First, a sampling-based image matting algorithm is proposed that utilizes sparse
coding in the image domain to extract an alpha matte. Multiple F and B samples, as
opposed to a single (F, B) pair, are used to describe the color at a blended pixel. A carefully
chosen dictionary made up of feature vectors from the F and B regions, refined through
a foreground probability map, ensures that the constrained sparse-code coefficients
approximate the alpha value. Experimental evaluations on a public benchmark
database show that our method achieves state-of-the-art results.
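The first contribution estimates alpha directly from sparse codes over a dictionary of F and B samples. A minimal numpy sketch under stated assumptions: a non-negative ISTA-style coding step, and alpha taken as the normalized total weight on foreground atoms; the thesis' constrained formulation may differ in both the solver and the constraints:

```python
import numpy as np

def alpha_from_sparse_code(pixel, D_fg, D_bg, lam=0.001, iters=500):
    # Dictionary columns: F atoms first, then B atoms.
    D = np.hstack([D_fg, D_bg])
    n_fg = D_fg.shape[1]
    c = np.zeros(D.shape[1])
    step = 1.0 / (np.linalg.norm(D, 2) ** 2 + 1e-8)
    for _ in range(iters):
        grad = D.T @ (D @ c - pixel)
        # non-negative soft threshold: enforces sparsity and c >= 0
        c = np.maximum(c - step * (grad + lam), 0.0)
    total = c.sum()
    # alpha approximated by the normalized foreground coefficient mass
    return float(c[:n_fg].sum() / total) if total > 0 else 0.5
```

For a pixel that is a 70/30 blend of a foreground and a background color, the foreground coefficient mass recovers an alpha close to 0.7.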
Second, a new video matting algorithm is proposed which uses a multi-frame graphical
model to ensure temporal coherency in the extracted matte. The alpha value at a pixel
needs to be consistent and smooth across the video sequence for better temporal
coherence. This is accomplished by simultaneously solving for the alpha mattes for
multiple consecutive frames. An objective function is proposed that can be solved in
closed-form as a sparse linear system. An adaptive temporal trimap propagation using
motion-assisted shape blending is utilized to propagate the trimaps automatically
between the key-frames. Experimental evaluations on an exclusive video matting dataset
validate the effectiveness of the method.
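The closed-form multi-frame solve can be illustrated with a toy version: couple each pixel's alpha trajectory across frames with a temporal-smoothness term and pin known trimap pixels with a data term. This sketch omits the spatial (matting-Laplacian) and cross-frame graph terms of the actual model, and the weights lam_t, lam_k are illustrative, not values from the thesis:

```python
import numpy as np

def solve_multiframe_alpha(trimap_vals, known_mask, lam_t=1.0, lam_k=100.0):
    # trimap_vals, known_mask: (n_frames, n_pixels); known pixels carry
    # alpha values of 0 or 1 from the trimap.
    n_frames, n_pix = trimap_vals.shape
    # finite-difference operator over frames: penalizes (a[t+1] - a[t])^2
    Dt = np.eye(n_frames - 1, n_frames, k=1) - np.eye(n_frames - 1, n_frames)
    A = lam_t * (Dt.T @ Dt)
    alpha = np.empty_like(trimap_vals, dtype=float)
    for p in range(n_pix):  # toy model: trajectories decouple per pixel
        K = np.diag(lam_k * known_mask[:, p].astype(float))
        b = lam_k * known_mask[:, p] * trimap_vals[:, p]
        alpha[:, p] = np.linalg.solve(A + K + 1e-8 * np.eye(n_frames), b)
    return alpha
```

With a pixel known to be foreground in frame 0 and background in frame 2, the unconstrained middle frame settles near 0.5, showing how the linear system interpolates alpha temporally.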
Third, a new sampling-based video matting algorithm is proposed that reinterprets
the matting problem from the perspective of sparse reconstruction error of F and B
samples. Sampling methods generally select an (F, B) pair that produces the least
reconstruction error, but the significance of this error has been left unexamined. Two
patch-based frameworks are used to ensure temporal coherency in the video mattes: a
multi-frame non-local means framework using coherency-sensitive hashing, and a
patch-based multi-frame graph model using motion. Qualitative and quantitative
evaluations indicate the effectiveness of the method in reducing temporal jitter and
maintaining spatial accuracy in the video mattes.
Doctor of Philosophy (SCE)
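The reconstruction-error reinterpretation above can be sketched as follows: reconstruct the pixel color separately from the F sample set and from the B sample set, and let the relative errors inform alpha. Plain least squares stands in for the sparse reconstruction of the thesis, and the final error-to-alpha mapping is an assumption for illustration:

```python
import numpy as np

def alpha_from_recon_error(pixel, fg_samples, bg_samples, eps=1e-8):
    # Columns of fg_samples / bg_samples are candidate F / B colors.
    def recon_error(samples):
        # simplification: unconstrained least squares instead of the
        # sparse reconstruction used in the thesis
        coef, *_ = np.linalg.lstsq(samples, pixel, rcond=None)
        return np.linalg.norm(samples @ coef - pixel)
    e_f = recon_error(fg_samples)
    e_b = recon_error(bg_samples)
    # assumed mapping: a pixel poorly explained by B is foreground-like
    return float(e_b / (e_f + e_b + eps))
```

A pixel lying exactly in the span of the F samples yields zero foreground error and an alpha near 1, which is the intuition the reinterpretation builds on.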
Sparse codes as alpha matte
In this paper, image matting is cast as a sparse coding problem wherein the sparse codes directly give the estimate of the alpha matte. Hence, there is no need to use the matting equation, which restricts the estimate of alpha to a single pair of foreground (F) and background (B) samples. A probabilistic segmentation provides a confidence value on each pixel belonging to F or B, based on which a dictionary is formed for use in sparse coding. This allows alpha to be estimated from more than just one pair of (F, B) samples. Experimental results on a benchmark dataset show that the proposed method performs close to state-of-the-art methods.
Published version